Accelerating communication for parallel programming models on GPU systems

Abstract

As an increasing number of leadership-class systems embrace GPU accelerators in the race towards exascale, efficient communication of GPU data is becoming one of the most critical components of high-performance computing. For developers of parallel programming models, implementing support for GPU-aware communication using native APIs for GPUs such as CUDA can be a daunting task, as it requires considerable effort with little guarantee of performance. In this work, we demonstrate the capability of the Unified Communication X (UCX) framework to compose a GPU-aware communication layer that serves multiple parallel programming models of the Charm++ ecosystem: Charm++, Adaptive MPI (AMPI), and Charm4py. We demonstrate the performance impact of our designs with microbenchmarks adapted from the OSU benchmark suite, obtaining improvements in latency of up to 10.1x in Charm++, 11.7x in AMPI, and 17.4x in Charm4py. We also observe increases in bandwidth of up to 10.1x in Charm++, 10x in AMPI, and 10.5x in Charm4py. We show the potential impact of our designs on real-world applications by evaluating a proxy application for the Jacobi iterative method, improving the communication performance by up to 12.4x in Charm++, 12.8x in AMPI, and 19.7x in Charm4py.

Similar resources

Accelerating parallel particle swarm optimization via GPU

Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems

We present a review of the current best practices in parallel programming models for dense linear algebra (DLA) on heterogeneous architectures. We consider multicore CPUs, standalone manycore coprocessors, GPUs, and combinations of these. Of interest is the evolution of the programming models for DLA libraries – in particular, the evolution from the popular LAPACK and ScaLAPACK libraries to th...

GPU-Vote: A Framework for Accelerating Voting Algorithms on GPU

Voting algorithms, such as histogram and Hough transforms, are frequently used in various domains, such as statistics and image processing. Algorithms in these domains may be accelerated using GPUs. Implementing voting algorithms efficiently on a GPU, however, is far from trivial due to irregularities and unpredictable memory accesses. Existing GPU implementations therefore target only...

Geometric Programming for Communication Systems

Geometric Programming (GP) is a class of nonlinear optimization with many useful theoretical and computational properties. Over the last few years, GP has been used to solve a variety of problems in the analysis and design of communication systems in several ‘layers’ in the communication network architecture, including information theory problems, signal processing algorithms, ...

Journal

Journal title: Parallel Computing

Year: 2022

ISSN: 1872-7336, 0167-8191

DOI: https://doi.org/10.1016/j.parco.2022.102969